Introduction 


Students run single-node OpenStack installations.

First we tell students about the rules of the session: 

1. I will be logging in through ssh and breaking things.

2. Each time I break something, you spawn a vm and see what happens.

3. For each break, we take around 5 minutes to find where the error is. 

4. We try to find ways to fix this. 

5. If no solution is found, I will show how to fix it on a given student's laptop.

6. An ad-hoc document will be produced with all the problem resolutions proposed by students 


Where to look for potential errors


Students need to be told where to look when resolving problems:


1. Show the devstack screen 

2. “nova” database 

3. iptables 

4. rabbitmq stats (“rabbitmqctl list_queues”, “rabbitmqctl list_consumers”)

5. nova show (show details about vm state) 

6. grep for “nova”, “glance”, “keystone”, “rabbitmq” in processes

7. libvirt tools (“virsh list”) 

Breaks 


stopped nova-api
Symptoms:
- CLI commands are not accepted
- dashboard does not respond
- we cannot telnet to nova-api server on port 8774
Fix:
- start nova-api process
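
The telnet check on port 8774 can also be scripted. The following is a minimal sketch in Python, assuming the script runs on a machine that can reach the API host. The helper name port_open and the 2-second timeout are illustrative choices, not part of any OpenStack tooling.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection performs the same TCP handshake that telnet would
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts and unreachable hosts
        return False


if __name__ == "__main__":
    # 8774 is the nova-api port named above; "localhost" assumes a single-node setup
    print("nova-api reachable:", port_open("localhost", 8774))
```

The same helper can be pointed at any other service port mentioned in this document.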


stopped nova-scheduler
Symptoms:
- the instance is stuck in the “scheduling” state
- user can perform tasks other than instance creation (e.g. terminate instances)
- message queue “scheduler” has non-zero message count
Fix:
- start nova-scheduler process


stopped nova-network
Symptoms:
- instance stuck in “networking” state
Fix:
- start nova-network process


stopped nova-compute
Symptoms:
- instance stuck in “scheduling” state
- outstanding message in compute.lab-X queue
- no consumer compute.lab-X in rabbitmq
Fix:
- start nova-compute process
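
The two rabbitmq symptoms (outstanding messages, no consumer) can be checked together. The sketch below parses text shaped like the output of "rabbitmqctl list_queues name messages consumers". It assumes three whitespace-separated columns per queue and ignores every other line. The function name stuck_queues and the sample queue names are hypothetical.

```python
def stuck_queues(listing):
    """Return names of queues that hold messages but have no consumer.

    `listing` is text in the shape printed by
    `rabbitmqctl list_queues name messages consumers` (assumed: three
    whitespace-separated columns per queue; other lines are ignored).
    """
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # skip banner lines such as "Listing queues ..." and "...done."
        if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        name, messages, consumers = fields[0], int(fields[1]), int(fields[2])
        if messages > 0 and consumers == 0:
            stuck.append(name)
    return stuck


if __name__ == "__main__":
    # made-up sample: the compute queue has 2 messages and nobody consuming them
    sample = "Listing queues ...\nscheduler\t0\t1\ncompute.lab-1\t2\t0\n...done.\n"
    print(stuck_queues(sample))
```

A queue reported here points at the service that should be consuming from it, which is the process to restart.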


stopped keystone
Symptoms:
- the CLI returns error 400 upon any request
- dashboard cannot display items
- cannot connect on port 5000 to keystone
Fix:
- start keystone process


stopped glance-api
Symptoms:
- unable to display image list with a cli (error 500)
- dashboard shows “unable to retrieve list of images”
Fix:
- start glance-api


stopped rabbitmq
Symptoms:
- CLI hangs on booting the vm
- nova-api outputs a trace with [Errno 111] ECONNREFUSED
Fix:
- start rabbitmq-server (rabbitmq does not output anything to screen - it's an ordinary linux daemon)


no more fixed ips to allocate
To produce: “mysql -p nova -e "update fixed_ips set instance_id=3"”
Symptoms:
- instance gets error on “networking” phase
- nova-network returns error “No more fixed ips”
- instance_id field in fixed_ips table for given network has all values set to something other than NULL
Fix:
- create a new network for a tenant (we will specifically need to put further instances into this net with --nic net-id=<new_net_id>)
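
The "all instance_id values are non-NULL" symptom boils down to one query. The sketch below runs it against an in-memory SQLite table standing in for the nova database. The simplified schema (address, network_id, instance_id) is an assumption made for illustration, and free_fixed_ips is a hypothetical helper name.

```python
import sqlite3


def free_fixed_ips(conn, network_id):
    """Count fixed IPs on a network that are not bound to any instance."""
    row = conn.execute(
        "SELECT COUNT(*) FROM fixed_ips "
        "WHERE network_id = ? AND instance_id IS NULL",
        (network_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    # in-memory stand-in for the nova database; simplified, assumed schema
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fixed_ips (address TEXT, network_id INTEGER, instance_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO fixed_ips VALUES (?, ?, ?)",
        [("10.0.0.2", 1, None), ("10.0.0.3", 1, 7)],
    )
    print("free before the break:", free_fixed_ips(conn, 1))
    # same statement as the break above: every address now looks allocated
    conn.execute("UPDATE fixed_ips SET instance_id = 3")
    print("free after the break:", free_fixed_ips(conn, 1))
```

On a real installation the same SELECT would be run through the mysql client against the nova database, after checking the actual column names.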


stopped libvirt-bin
Symptoms:
- instance has error in “spawning” state
- nova-compute shows “Failed to connect to socket /var/run/libvirt/libvirt-sock”
Fix:
- start libvirt-bin (from ordinary root console, not from devstack screen)


instance got destroyed
Symptoms:
- instance is unreachable but its status is ACTIVE
- “virsh list” on the compute node where the instance is located does not show the instance
Fix:
- grab instance data in libvirt (“nova show <inst.ID>”: instance_name, host)
- go to host via ssh
- list vm-s in virsh - there is no “instance_name” running
- go to /etc/libvirt/qemu/
- virsh create ./instance_name.xml
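
Checking whether the instance appears in “virsh list” can be automated. The sketch below parses text in the usual Id / Name / State column layout of “virsh list”. The layout is an assumption about the output format, and the helper name running_domains and the sample domain name are made up.

```python
def running_domains(virsh_output):
    """Extract domain names from `virsh list` text (assumed Id/Name/State columns)."""
    names = []
    for line in virsh_output.splitlines():
        fields = line.split()
        # data rows start with a numeric domain id; header and separator do not
        if len(fields) >= 3 and fields[0].isdigit():
            names.append(fields[1])
    return names


if __name__ == "__main__":
    sample = (
        " Id    Name                           State\n"
        "----------------------------------------------------\n"
        " 1     instance-00000001              running\n"
    )
    wanted = "instance-00000002"  # hypothetical instance_name taken from nova show
    print(wanted, "known to libvirt:", wanted in running_domains(sample))
```

If the name reported by nova show is missing from the list, the instance has to be recreated from its XML definition as described in the fix.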


flushed NAT rules on nova-network host (for instances with floating ips)
To produce: “iptables -F -t nat”
Symptoms:
- user can ssh to instance via fixed ip
- when he tries to ssh on floating ip, he gets “remote host identification changed” warning. This is because the floating ip now points to nova-network host.
- he can’t get in via ssh with his user/pass
- iptables shows no rules in NAT table
Fix:
- restart nova-network
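
An empty NAT table can be detected from the output of “iptables -t nat -S”, which prints policies as -P lines, user-defined chains as -N lines and rules as -A lines. The sketch below only counts the -A lines. The helper name nat_rule_count, the chain name and the addresses in the samples are made up.

```python
def nat_rule_count(rules_text):
    """Count rule lines (-A ...) in `iptables -t nat -S` style output."""
    return sum(1 for line in rules_text.splitlines() if line.startswith("-A "))


if __name__ == "__main__":
    # after "iptables -F -t nat" only policies and empty chains remain
    flushed = (
        "-P PREROUTING ACCEPT\n"
        "-P OUTPUT ACCEPT\n"
        "-P POSTROUTING ACCEPT\n"
        "-N example-chain\n"
    )
    healthy = flushed + (
        "-A PREROUTING -j example-chain\n"
        "-A example-chain -d 192.0.2.10/32 -j DNAT --to-destination 10.0.0.2\n"
    )
    print("flushed table rules:", nat_rule_count(flushed))
    print("healthy table rules:", nat_rule_count(healthy))
```

A count of zero on the nova-network host matches the symptom above.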


vlan bridge ip deleted
Symptoms:
- user has 2 instances on different networks
- there is a secgroup which lets all traffic pass between those networks, but the instances on both networks can’t see each other
- administrator cannot access instances in this net directly from the hypervisor
- “ip addr show br<X>” on bridges corresponding to networks shows that there is no ip set on one of them
Fix:
- re-add an IP to the bridge
- sometimes network restart needs to be done on the instance to pick up ip again
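
The bridge check can be scripted by looking for “inet” lines in the output of “ip addr show br<X>”. The sketch below is a minimal parser under that assumption. The helper name ipv4_addresses, the bridge name br100 and the addresses in the sample are made up.

```python
def ipv4_addresses(ip_addr_output):
    """Return the IPv4 addresses (with prefix length) found in `ip addr show` text."""
    addrs = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # IPv4 lines start with "inet"; IPv6 lines start with "inet6" and are skipped
        if len(fields) >= 2 and fields[0] == "inet":
            addrs.append(fields[1])
    return addrs


if __name__ == "__main__":
    sample = (
        "4: br100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP\n"
        "    link/ether fa:16:3e:00:00:01 brd ff:ff:ff:ff:ff:ff\n"
        "    inet6 fe80::1/64 scope link\n"
    )
    # an empty list means the bridge has lost its IPv4 address
    print("addresses on bridge:", ipv4_addresses(sample))
```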


dnsmasq not running on the hypervisor
Symptoms:
- after reboot instances do not come up again
- they are stuck at getting address from dhcp server
- console log shows that it hangs on network configuration
Fix:
- restart nova-network


